Skip to content

Conversation

@StanFromIreland
Copy link
Member

@StanFromIreland StanFromIreland commented Sep 23, 2025

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's very likely that people are relying on this at this point.

'start' : 0, 'reason' : 'ordinal not in range'}),
(UnicodeDecodeError, ('ascii', bytearray(b'\xff'), 0, 1,
'ordinal not in range'), {},
{'args' : ('ascii', bytearray(b'\xff'), 0, 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is setting off some alarm bells for me. If we're changing tests like this, we're probably breaking something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a bug fix, there is a mismatch:

>>> exc = UnicodeDecodeError('ascii', bytearray(b'\xff'), 0, 1,'ordinal not in range')
>>> exc.args # 1 is the object
('ascii', bytearray(b'\xff'), 0, 1, 'ordinal not in range')
>>> exc.object
b'\xff'
>>> 

@bedevere-app
Copy link

bedevere-app bot commented Sep 24, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't change this. People might rely on this behavior and I didn't want to change this (see also #58038 (comment)).

In general .args contains the arguments that were passed at construction time. The same holds for other exceptions: https://docs.python.org/3/library/exceptions.html#BaseException.args.

@StanFromIreland StanFromIreland deleted the unicode-errors branch September 24, 2025 14:39
@StanFromIreland
Copy link
Member Author

Closed the issue.

@vstinner
Copy link
Member

I think it's very likely that people are relying on this at this point.

Well, the number of 3rd party codecs should be pretty low, and this change is a backward incompatible change on purpose: it fixes an old bug.

@picnixz
Copy link
Member

picnixz commented Sep 24, 2025

But it is not the documented behavior (or this should be re-documented for UnicodeError explicitly, or maybe it is documented elsewhere and I didn't find it (I am no more on my dev session)). Users should rely on the non-args value and instead access the named properties I think.

@vstinner
Copy link
Member

Users should rely on the non-args value and instead access the named properties I think.

I agree with that. But it doesn't prevent to fix exc.args member, an old Python bug.

args is not documented explicitly in UnicodeError: https://docs.python.org/dev/library/exceptions.html#UnicodeError

It's documented on the BaseException: https://docs.python.org/dev/library/exceptions.html#BaseException.args

@picnixz
Copy link
Member

picnixz commented Sep 25, 2025

Yes, but the fact that it says:

The tuple of arguments given to the exception constructor

makes me feel that we should not break this part of the contract. That is, args should contain the original arguments tuple, and it doesn't matter how we change the attributes.

OTOH, we could have maybe have an astuple() method that would return the args as one could expect. It'd be a new method though but changing the meaning of args is risky. Now, after re-reading the issue, I also spotted in the PEP the following:

Should further encoding errors occur, the encoder is allowed to reuse the exception object for the next call to the callback.

So, re-using the same exception object should be allowed. Thus, I wouldn't consider the current behavior an "old" bug.


Note that a similar argument could be said for OSError. We can change the errno but args is not updated. Also OSError.args only contains the two first arguments for backward compatibility (it's documented as such). I would rather recall the fact that re-using UnicodeError exceptions does not update args.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants